[v10] Add character count countFunction option#1897
Conversation
countFunction optioncountFunction option
5562f44 to
c3c6fc7
Compare
272d552 to
551bf35
Compare
| * @satisfies {Record<string, (text: string, segmenter?: Intl.Segmenter | null) => number>} | ||
| */ | ||
| static countFunctions = Object.freeze({ | ||
| characters(text, segmenter) { |
There was a problem hiding this comment.
Some thoughts on the interface here...
Currently there are two things that influence the choice of algorithm: the selection of countFunction, and whether segmenter is set or not. Wondering if it would be simpler to have explicit functions for graphemes as well as characters and words? In that case passing countType: "graphemes" would be syntactic sugar for passing countFunction: countFunctions.graphemes and the same goes for "characters" and "words".
It also seems slightly inconsistent that if you pass countType: "words" with a custom function, we pass through a segmenter that can do word segmentation, but if you don't pass the custom function, it doesn't use the segmenter.
Maybe if the interface is simplified to fn(text) the segmenter could be an internal implementation detail?
There was a problem hiding this comment.
Yeah, I partly did this because Intl.Segmenter also supports granularity: "word" 😮
So instead of countType: "words" we could keep maxwords but set granularity: "word"
I'll see what feedback comes from the GOV.UK Design System team
Maybe if the interface is simplified to fn(text) the segmenter could be an internal implementation detail?
Yeah, I'm glad you said. I reckon for now let's ditch the static property and keep it all internal, especially with a potential performance cost to creating a new segmenter on every keystroke (hence injecting it in)
There was a problem hiding this comment.
Feedback applied, thanks @MatMoore
Regarding word counts via the segmenter, I've messaged @romaricpascal and added a comment as the Unicode Default Word Boundary Specification varies quite significantly from /\S+/g
For example, consider this phrase:
My mother-in-law—Wait, what?
It matches only 3 words currently:
["My", "mother-in-law—Wait,", "what?"]Yet using the segmenter with granularity: "word" it matches 6 words:
["My", "mother", "in", "law", "Wait", "what"]There was a problem hiding this comment.
Yeah, I'm glad you said. I reckon for now let's ditch the static property and keep it all internal, especially with a potential performance cost to creating a new segmenter on every keystroke (hence injecting it in)
Yeah I figured we wouldn't want to instantiate one each call. The way you've done it is what I was imagining.
That unicode word boundary spec seems like a cleaner way to do word counting if we were implementing from scratch, but I guess it's a potentially breaking change so maybe it's not worth it if we're keeping maxwords. Are we going to wait for the govuk implementation before we merge this?
There was a problem hiding this comment.
Thanks. Yeah it'll be discussed at their next dev catch up
What we know is that they've proposed for maxwords to be deprecated
We could take the countType option as an opt-in to use segmenter. But as this comment mentions that might mean upgrading /\S+/g to a better word boundary regex for older browsers that lack support
There was a problem hiding this comment.
They've not had chance to discuss anything character count related other than this comment:
But that applies to #1899 primarily
For this PR all count functions now use the same interface so it's ready for another review
551bf35 to
921cd37
Compare
921cd37 to
6b80700
Compare
6b80700 to
07676d2
Compare
c3c6fc7 to
e1a7744
Compare
07676d2 to
649b198
Compare
e1a7744 to
9ac9e9b
Compare
649b198 to
e76c2e7
Compare
9ac9e9b to
93515cf
Compare
e76c2e7 to
6c80311
Compare
6c80311 to
3290d27
Compare
93515cf to
906201e
Compare
3290d27 to
9492e12
Compare
eb6e5ba to
4365f61
Compare
9492e12 to
e93b9da
Compare
c7b7bbe to
1da2682
Compare
e93b9da to
078f07a
Compare
078f07a to
7b1a233
Compare
|
|
@anandamaryon1 I'll be adding a changelog entry into #1899 to wrap up this stack of work:
With this initial PR used to uplift some more recent GOV.UK Frontend changes: Plus this PR to allow components to configure functions within their options: We've contributed our improvements back to GOV.UK Frontend via: |



Description
This PR adds a new
countFunctionoption to character counts to cater for server-side differences in:\nversus\r\nFor example, services might already count multi-byte strings server-side (e.g.
len()in Python) resulting in client-side count mismatches, yet upcoming support for graphemecountTypeis blocked by a 3rd party library integrationSupport for a custom
countFunctionmeans teams can close this support gapChecklist